Eigenspace-based speaker adaptation methods in Persian speech recognition systems

Author

  • Z. Ansari
Abstract

Among speaker adaptation algorithms, the eigenvoice (EV) and eigenspace-based MLLR (EMLLR) approaches were proposed for rapid adaptation with very limited adaptation data. In these methods, the speaker-adapted model is constrained to be a weighted combination of a set of orthogonal basis vectors. This sharply reduces both the number of parameters to be estimated from the adaptation data and the amount of adaptation data required. Although both algorithms perform acceptably with 5 to 10 seconds of adaptation speech, the availability of a large amount of adaptation data does not necessarily lead to more efficient models. Experimental results of applying the EV and EMLLR algorithms to the FARSDAT database, discussed in the paper, show that with limited supervised adaptation data (5-10 seconds) these methods improve the phoneme recognition rate by 5.9% and 5.3%, respectively. They also yield about a 4% improvement in unsupervised adaptation, whereas common speaker adaptation methods such as MLLR cannot work efficiently with limited supervised or unsupervised adaptation data. In addition, the paper improves EV performance for large amounts of adaptation data by segmenting the eigenspace based on model characteristics.

Keywords: speaker adaptation; principal component analysis; eigenvoice; eigenspace
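As a rough illustration of the constraint described in the abstract, the sketch below builds an orthonormal eigenvoice basis with PCA and then adapts to a new speaker by estimating only a few weights. It is a minimal sketch under stated assumptions, not the paper's implementation: the data are synthetic and noise-free, the function names (`train_eigenvoices`, `adapt`, `demo`) are hypothetical, and the weights are fit by ordinary least squares on the observed supervector components rather than by the maximum-likelihood estimation normally used in eigenvoice adaptation.

```python
import numpy as np


def train_eigenvoices(supervectors, num_eigenvoices):
    """PCA over speaker-dependent mean supervectors (one row per training speaker)."""
    mean_voice = supervectors.mean(axis=0)
    centered = supervectors - mean_voice
    # Rows of vt are orthonormal principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_voice, vt[:num_eigenvoices]


def adapt(mean_voice, eigenvoices, observed, mask):
    """Estimate eigenvoice weights from the observed components only.

    Assumption: plain least squares stands in for maximum-likelihood
    weight estimation. `mask` marks the supervector components for which
    the (limited) adaptation data gave an estimate.
    """
    basis_obs = eigenvoices[:, mask].T            # (num_observed, K)
    target = observed[mask] - mean_voice[mask]    # (num_observed,)
    weights, *_ = np.linalg.lstsq(basis_obs, target, rcond=None)
    # Adapted model: mean voice plus a weighted combination of eigenvoices.
    adapted = mean_voice + weights @ eigenvoices
    return adapted, weights


def demo():
    """Toy run on synthetic, noise-free data (all sizes are arbitrary)."""
    rng = np.random.default_rng(0)
    dim, num_speakers, num_eigenvoices = 40, 30, 2
    true_basis = rng.normal(size=(num_eigenvoices, dim))
    offset = rng.normal(size=dim)
    train = offset + rng.normal(size=(num_speakers, num_eigenvoices)) @ true_basis
    mean_voice, eigenvoices = train_eigenvoices(train, num_eigenvoices)

    # New speaker: only the first 10 of 40 components are observed.
    new_speaker = offset + rng.normal(size=num_eigenvoices) @ true_basis
    mask = np.zeros(dim, dtype=bool)
    mask[:10] = True
    adapted, weights = adapt(mean_voice, eigenvoices, new_speaker, mask)
    error = float(np.max(np.abs(adapted - new_speaker)))
    return mean_voice, eigenvoices, weights, error
```

In this toy setting only 2 weights are estimated instead of 40 supervector components, which mirrors why such methods can get by with a few seconds of speech. The same constraint also caps what extra adaptation data can buy, since the adapted model can never leave the span of the basis.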


Similar resources

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...


A comparative study of two kernel eigenspace-based speaker adaptation methods on large vocabulary continuous speech recognition

Eigenvoice (EV) speaker adaptation has been shown effective for fast speaker adaptation when the amount of adaptation data is scarce. In the past two years, we have been investigating the application of kernel methods to improve EV speaker adaptation by exploiting possible nonlinearity in the speaker space, and two methods were proposed: embedded kernel eigenvoice (eKEV) and kernel eigenspace-b...


ACOUSTIC MODEL ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION AND ANIMAL VOCALIZATION CLASSIFICATION

ACOUSTIC MODEL ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION AND ANIMAL VOCALIZATION CLASSIFICATION Jidong Tao, B.Eng., M.S. Marquette University, 2009 Automatic speech recognition (ASR) converts human speech to readable text. Acoustic model adaptation, also called speaker adaptation, is one of the most promising techniques in ASR for improving recognition accuracy. Adaptation works by tuning a g...


Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...



Publication date: 2010